Montana State University - Statistical Consulting and Research Services Logo

Montana State University
Statistical Consulting and Research Services

montana.edu/statisticalconsulting

Spatial and Temporal Considerations of Flower Occurance at the Northern Agricultural Research Center Near Havre Montana (DRAFT)

Prepared for Ermatinger Lochlin

This document may be subject to further revision and is not a final report.

Will Hammond and Gifty Osei, Lead Statistician (Statistical Consulting and Research Services)
Riley Collins, Secondary Statistician (Statistical Consulting and Research Services)
Andy Hoegh, Director (Instructor for Statistical Consulting Class)

May 02, 2023

Author Note:

This material is provided to communicate advice from SCRS statisticians based on our best understanding of your research needs. We encourage you to use this report in discussions with colleagues. Please do not publish any portion of this material without permission.

© Primary Statistician

When you make use of our work for publications or presentations, please be sure to acknowledge the funding we receive from NIGMS using the following: “Research reported in this publication was supported by Institutional Development Awards (IDeA) from the National Institute of General Medical Sciences of the National Institutes of Health under Awards P20GM103474, U54GM115371, 5P20GM104417, and 2U54GM104944-06. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

1 Introduction

Montana’s most important region of wheat production is colloquially referred to as the Golden Triangle(GT). The area owes its agricultural prominence to the grasslands that they were converted from. It has been estimated that 73\(\%\) of the grassland that were present in the GT prior to European settlement of region have been lost to land conversion. In the case of wheat production, the balance between an insect pest of wheat (the wheat stem sawfly) and its natural enemies has been disrupted, resulting in epidemic outbreaks of wheat steam sawfly(WSS) infestation in wheat fields.The specialist parasitoids B. cephi and B. lissogaster require a WSS larvae feeding inside a grass or wheat stem as host for their offspring. They also require diets rich in carbohydrates and other nutrients compounds to achieve maximum livelihood and increase traits associated with fecundity. Because little is known about the floral resources that may be important to the parasitoids, a two year survey was conducted of an intact grassland plot(12 ha) located within the GT. During each survey(~6 per season, every two weeks) the location, species, and quantity of actively flowering plants encountered were recorded. LiDAR digital elevation model and daily weather observations to model the spatiotemporal context of our observed flowering plants were also retrieved. This model may help us to understand which species of flora warrant further investigation as supplemental resources for the specialist parasitoids of WSS based on their temporal alignment with the braconids’s phenology and observed spatial distribution. Anticipating the culmination of this project will help inform future research on providing supplemental nutrition to parasitoids of WSS in the field setting. Identifying plants native to the study region that may provide the parasitoids with similar benefits to that of buck wheat and cowpea is the big goal.

1.1 Data Collection

This research describes surveys of flowering plants conducted in a 12-hectare grassland on the northern Agricultural Research Center in Montana. The site has been protected from development and livestock grazing since 1916, allowing for the present flora to be similar to what may inhabit analogous areas in the absence of human influence. The surveys were conducted every two weeks between June and September of 2019 and 2020 using belt transects, and the surveys consisted of GPS location, species, and quantity of actively flowering plants. In total, 9720 point observations of actively flowering plants from 61 different species were collected.

2 Data Description and Methods

This analysis utilized data from three primary sources: a list of flower species found on the plot of land, weather data for a nearby station, and data collected at numerous points on the plot. The flower species list contained 65 species, their scientific and common names, and a conservation score. Weather data for each day in 2019 and 2020 included mean air and soil temperature and precipitation received, used to calculate growing degree days. Data collected at numerous points on the plot included flower species and quantity, as well as variables such as x and y coordinates, mean elevation, aspect, slope, and roughness. Aspect was transformed into an indicator for north facing slopes.

Plot of flower observations across the survey area. Colors correspond to individual species.

Figure 2.1: Plot of flower observations across the survey area. Colors correspond to individual species.

2.1 Poisson Point Process Models

An inhomogenous poisson point process model was used to glean insights to the distribution of species in terms of favorable or unfavorable terrain characteristics. The approach used was to fit an individual model for each species. (Banerjee, Carlin, & Gelfand, 2014) (Baddeley, Rubak, & Turner, 2015) These models are based on two main assumptions:

  1. Observed points are independent of each other. That is, the presence of one point in a region, \(B \subset \mathbb{R}^2\), is not related to the presence of other points in the same region.

  2. The number of points (flower observations) within a region,B, has a poisson distribution, and the expected count of points within a region is = \(\int_{region}\lambda(u)du\), where \(\lambda(u)\) is the intensity (average number of points per unit area) of points at the location \(u \in B\).

Each model produces an intensity surface across the survey area according to log-linear poisson likelihood, where the points \(u\) correspond to pixels in the survey area.

\[P(N(u)) \sim Pois(\lambda(u))\]

To estimate the intensity with respect to the terrain covariates we use a log-linear intensity function:

\[ \lambda(u)=exp(\beta_0 + \beta_1aspect_u + \beta_2slope_u + \beta_3roughness_u) \]

Here, the log-linear intensity function imposes another assumption, that intensity increases or decreases exponentially as a function of aspect, slope, and roughness.

The intercept term, \(\beta_0\) represents the baseline log-intensity of occurance per unit area. The coefficients \(\beta_i\) represent the expected increase in log-intensity for a 1 unit increase in the corresponding variable (aspect, roughness, slope), holding all others constant. \(\beta_i\) with large magnitude represent strong underlying relationships with terrain covariates for a given species, while \(\beta_i\) close to 0 indicate that the species is relatively unaffected by the terrain covariate. Since the relationship between terrain covariates and intensity is log-linear, coefficients must be interpreted differently from coefficients when the relationship is linear. Rather than an additive relationship, coefficients in log-linear regression are multiplicative. The exponent of each coefficient represents the relative change to the baseline intensity when the corresponding covariate increases by 1. \((\exp{\beta_i} - 1)*100\) represents the impact of the corresponding covariate in terms of the percentage change to the baseline intensity. For example, if \(\beta = 0.5\), then \((\exp{0.5} - 1)*100 = 64.87\) indicates a 64% increase in intensity, relative to the baseline intensity, for a one unit increase in the corresponding covariate, holding all others constant.

The model coefficients then represent estimates of the relationship between a species presence at a given location with observed terrain characteristics. For the context of understanding the individual terrain preferences of a species, the magnitude of the estimated coefficients represent the relative importance of a particular terrain covariate. For example, a species with a large estimated coefficient on aspect is more likely to grow on a north facing slope. To identify species which will grow well on any given terrain, species with low-magnitude coefficients should be prioritized.

2.2 Limitations

There is likely some large violation of the assumption of point-wise independence. From visual investigation of flower observations, there is evidence of some clustering, especially with respect to those species with limited observations. We recommend limiting inferences from this modeling approach to species with many observations that don’t appear to violate the assumption of independence. Although not considered here, Ripley’s K-function can quantify the degree of clustering for a given species.

The assumption of a log-linear intensity with respect to terrain covariates limits the intensity to be exponential and uni-directional as a function of a given terrain covariate. This means that for a given species, the relationship between a particular terrain covariate and the estimated intensity can only increase or decrease in an exponential form. Considering that a species may have an “optimal” terrain type, future work should aim to fit a similar model that includes polynomial terrain coefficients, for example, \(roughness^2\). The introduction of polynomial predictors will allow for bi-directional intensities as a function fo particular terrain covariates.

Additionally, interaction terms were not considered in these models, although the true existence of interaction effects may exist. For example slope may mediate the relationship of intensity with roughness. That is, a species may grow poorly on rough terrain with a low slope, but may grow particularly well on rough terrain with a high slope. Future modeling should consider the existance of interaction effects between terrain covariates.

3 Results

3.1 Spatial distribution of species

In the context of providing supplemental nutrition for parasitoids, highly-prevalent flowers are most important. The mean observation count with respect to each species was 489, and the median was 469. Therefore, species which were observed fewer than 50 times are relatively rare and would likely be poor candidates for nutritional supplementation, given their sparse natural occurrence. Additionally, species with low observation counts tended to cluster, violating the independence assumption of the poisson point process models. Here, we limit inferences from the poisson point process modeling approach to species with greater than 50 observations, but the results of the poisson-point process models should be further investigated for validity before drawing conclusions.

Intensity surface for Lactuca pulchella. Observed points do not align with the estimated intensity surface, strong evidence of clustering.

Figure 3.1: Intensity surface for Lactuca pulchella. Observed points do not align with the estimated intensity surface, strong evidence of clustering.

Intensity surface for Aster falcatus. Observed points align well with estimated intensity surface, little to no evidence of clustering.

Figure 3.2: Intensity surface for Aster falcatus. Observed points align well with estimated intensity surface, little to no evidence of clustering.

Figures 3.1 and 3.2 demonstrate poisson point process models which do and do not violate the assumption of independence, respectively. Coefficient estimates on terrain covariates may be misleading for species in which the assumption of independence was violated. Intensity plots for all species investigated are included in the appendix.

3.2 Temporal Flowering Patterns

For the consideration of a flower’s temporal occurrence, growing degree days (GDD) was considered in place of date. Since yearly climate fluctuations impact the relative timing of the growing season each year, GDD provides an alternative measure of timing within a growing season that is comparable between years. Growing degree days were calculated starting after the yearly “green-up” date, March 21st and March 18th in 2019 and 2020, respectively. For each day, the positive differences between the daily mean and a threshold temperature (\(0^\circ C\) and \(4^\circ C\)) was aggregated through time. The GDD measurement for a given date is the sum of all the previous positive differences (daily mean temperature - 0 or 4\(^\circ C\)) and the difference for the given day, if positive. Days with mean temperatures below the threshold do not contribute to GDD measurements.

In the context of supplemental nutrition for parasitoids, species which occur in limited temporal ranges (in terms of GDD) are less useful than species which occur across wide temporal ranges. Therefore, the investigation of temporal occurrences was limited to species observed on 2 or more survey dates. Figures 3.3 and 3.4 display the observed species occurrences against growing degree days with a threshold temperature of \(0^\circ C\) for 2019 and 2020, respectively. Between survey dates, flower occurrences were interpolated with a linear trend for easier observation of temporal trends. Plots using a growing degree day threshold of \(4^\circ C\) are included in the appendix.

Plots of species occurrence in 2019 vs GDD using a threshold temperature of 0 degrees

Figure 3.3: Plots of species occurrence in 2019 vs GDD using a threshold temperature of 0 degrees

Plots of species occurrence in 2020 vs GDD using a threshold temperature of 0 degrees

Figure 3.4: Plots of species occurrence in 2020 vs GDD using a threshold temperature of 0 degrees

Notably, some species which were observed on two or more survey dates in 2019 were not observed on at least two survey dates in 2020 and vice-versa. This discrepancy highlights a high degree of unmeasured inter-annual variability in the growing conditions necessary for each species to flower. For accurate insights as to the temporal occurrence of these species, it may be necessary to consider other sources of information (e.g. seasonal rainfall), or conduct further surveys in future growing seasons.

4 Spatially and Temporally Relevant Species

Between the spatial and temporal considerations of species, two “filters” were applied in terms of relevance. First, spatial inferences were limites to those species which were observed at 50 or more sites throughout the survey area. Second, temporal considerations were limited to those species which were observed on two or more survey dates in either 2019 or 2020. These constraints limit the number of species considered for supplemental parasitoid nutrition to 21. Table 1 presents these species and relevant information including the spatial covariate coefficients on a percentage scale. These coefficients should be interpreted as the percent change to the baseline expected rate of occurrence per unit area for a one unit increase in the relevant covariate. For example, Artemisa cana has an expected baseline rate of 0.0028245 (X.Intercept.) flowers per unit area when aspect, roughness, and slope are 0. However, holding the other covariates constant, for a site with an aspect of 1 (North facing), the expected rate of Artemisa cana occurrences decreases by 3.87%. Similarly, with all other covariates held constant a 1 unit increase in roughness decreases the expected rate of Artemisa cana occurances by 4.22%. Finally, holding all other covariates constant, a 1 unit increase in slope increases the expected rate of Artemisa cana occurances by 7.81%.

Spp site_count_total X.Intercept. aspect roughness slope
Sphaeralcea coccinea 217 0.00 -25.17 23.90 -11.13
Symphoricarpos albus 54 0.00 -8.84 1519.35 -39.42
Tragopogon dubius 819 0.01 -6.43 -38.63 11.90
Artemisia cana 455 0.00 -3.87 -4.22 7.81
Lactuca serriola 91 0.00 11.40 45.92 1.50
Medicago sativa 739 0.01 15.28 -49.12 1.99
Gaura coccinea 188 0.00 34.33 -14.04 23.25
Melilotus officinalis 596 0.00 52.57 -31.06 11.08
Achillea millefolium 818 0.00 67.01 6.59 6.97
Gutierrezia sarothrae 158 0.00 77.05 -31.13 23.10
Pediomelum argophyllum 469 0.00 114.05 -9.18 23.74
Vicia americana 736 0.01 137.02 -50.60 -1.12
Solidago mollis 156 0.00 204.42 274.47 -9.26
Rosa arkansana 159 0.00 226.96 -30.74 52.11
Lygodesmia juncea 221 0.00 276.67 -62.11 25.09
Heterotheca villosa 776 0.00 277.73 -12.59 25.33
Ratibida columnifera 244 0.00 286.43 -14.77 28.02
Dalea purpurea 158 0.00 347.19 51.44 23.68
Gaillardia aristata 89 0.00 2016.34 -65.45 66.70
Linum lewisii 262 0.00 2526.97 -71.57 68.20
Campanula rotundifolia 51 0.00 5701.12 -5.66 81.35

Table 1: Table of relevant species information including whether or not it is native or produces extra-floral nectar, as well as the number of observations within the survey area, and the poisson point process coefficients in terms of the percent change in expected rate of occurance for a one unit increase in the corresponding covariate, holding other covariates constant.

Between the growing degree day plots and Table 1, candidate species may be identified for potential utility parasitoid nutritional supplementation based on when they tend to flower in terms of GDD and their terrain preferences.

5 Appendix

6 Packages used

The R package bookdown (Xie, 2023) was used to create this report document using the R language (R Core Team, 2021). In addition, the following were packages used for the analysis and/or formatting of this document:

7 References

Allaire, J., Xie, Y., McPherson, J., Luraschi, J., Ushey, K., Atkins, A., … Iannone, R. (2022). Rmarkdown: Dynamic documents for r. Retrieved from https://CRAN.R-project.org/package=rmarkdown
Baddeley, A., Rubak, E., & Turner, R. (2015). Spatial point patterns: Methodology and applications with R. Retrieved from https://www.routledge.com/Spatial-Point-Patterns-Methodology-and-Applications-with-R/Baddeley-Rubak-Turner/p/book/9781482210200/
Baddeley, A., Turner, R., & Rubak, E. (2023). Spatstat: Spatial point pattern analysis, model-fitting, simulation, tests. Retrieved from http://spatstat.org/
Banerjee, S., Carlin, B. P., & Gelfand, A. E. (2014). Hierarchical modeling and analysis for spatial data. CRC press.
Fox, J., Weisberg, S., Price, B., Friendly, M., & Hong, J. (2022). Effects: Effect displays for linear, generalized linear, and other models. Retrieved from https://CRAN.R-project.org/package=effects
Luo, D., Ganesh, S., & Koolaard, J. (2022). Predictmeans: Predicted means for linear and semi parametric models. Retrieved from https://CRAN.R-project.org/package=predictmeans
Müller, K., & Wickham, H. (2021). Tibble: Simple data frames. Retrieved from https://CRAN.R-project.org/package=tibble
Phillips, N. (2017). Yarrr: A companion to the e-book "YaRrr!: The pirate’s guide to r". Retrieved from www.thepiratesguidetor.com
Pinheiro, J., Bates, D., & R-core. (2021). Nlme: Linear and nonlinear mixed effects models. Retrieved from https://svn.r-project.org/R-packages/trunk/nlme/
Pruim, R., Kaplan, D. T., & Horton, N. J. (2021). Mosaic: Project MOSAIC statistics and mathematics teaching utilities. Retrieved from https://CRAN.R-project.org/package=mosaic
R Core Team. (2021). R: A language and environment for statistical computing. Retrieved from https://www.R-project.org/
Wickham, H., & Bryan, J. (2022). Readxl: Read excel files. Retrieved from https://CRAN.R-project.org/package=readxl
Wickham, H., Chang, W., Henry, L., Pedersen, T. L., Takahashi, K., Wilke, C., … Dunnington, D. (2022). ggplot2: Create elegant data visualisations using the grammar of graphics. Retrieved from https://CRAN.R-project.org/package=ggplot2
Wickham, H., François, R., Henry, L., & Müller, K. (2022). Dplyr: A grammar of data manipulation. Retrieved from https://CRAN.R-project.org/package=dplyr
Wickham, H., Hester, J., & Bryan, J. (2022). Readr: Read rectangular text data. Retrieved from https://CRAN.R-project.org/package=readr
Wilke, C. O. (2021). Ggridges: Ridgeline plots in ggplot2. Retrieved from https://wilkelab.org/ggridges/
Xie, Y. (2022). Knitr: A general-purpose package for dynamic report generation in r. Retrieved from https://yihui.org/knitr/
Xie, Y. (2023). Bookdown: Authoring books and technical documents with r markdown. Retrieved from https://CRAN.R-project.org/package=bookdown
Zhu, H. (2021). kableExtra: Construct complex table with kable and pipe syntax. Retrieved from https://CRAN.R-project.org/package=kableExtra